Analytical continuation of imaginary axis data using maximum entropy 
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We study the maximum entropy (MaxEnt) approach for analytical continuation of spectral data 
from imaginary times to real frequencies. The total error is divided in a statistical error, due to 
the noise in the input data, and a systematic error, due to deviations of the default function, used 
in the MaxEnt approach, from the exact spectrum. We find that the MaxEnt approach in its 
classical formulation can lead to a nonoptimal balance between the two types of errors, leading to 
an unnecessary large statistical error. The statistical error can be reduced by splitting up the data in 
several batches, performing a MaxEnt calculation for each batch and averaging. This can outweigh 
an increase in the systematic error resulting from this approach. The output from the MaxEnt result 
can be used as a default function for a new MaxEnt calculation. Such iterations often lead to worse 
results due to an increase in the statistical error. By splitting up the data in batches, the statistical 
error is reduced and and the increase resulting from iterations can be outweighed by a decrease in 
the systematic error. Finally we consider a linearized version to obtain a better understanding of 
the method. 



I. INTRODUCTION 

The analytical continuation of spectral functions from 
imaginary time r to real energies a; is a difficult problem 
due to its ill-posed nature, i.e., the output can depend 
very sensitively on the input. For strongly correlated 
electrons, however, this is an important problem. Most 
approaches for such systems involve uncontrolled approx- 
imations. Using quantum Monte-Carlo (QMC) methods 
or quantum cluster methods it is, however, possible to 
obtain accurate data for Green's functions and response 
functions on the imaginary axis, raising the problem of 
analytical continuation to the real axis. Since these meth- 
ods provide data with statistical noise, the ill-posed na- 
ture of the problem makes analytical continuation very 
difficult. 

This problem can be treated within the Bayesian 
theoryi^ The problem is regularized by introducing an 
entropy in terms of the deviation of the output real 
axis spectrum from some default function. The impor- 
tance of the entropy is controlled by a parameter a, 
which is determined using statistical argumentsi^i^ This 
method is referred to as the Maximum Entropy (Max- 
Ent) method. It has been rather successful in performing 
analytical continuations. Alternative methods have been 
proposed, such as Fade approximations,"^'^ singular value 
decomposition)^ stochastic regularization^ and sampling 
schemes jii^ 

In this paper we focus on the MaxEnt method. This 
method is usually discussed in terms of the Bayesian the- 
ory. Here we start from the equations generated by the 
MaxEnt formalism and use an algebraic approach to an- 
alyze the theory. We discuss the accuracy that can be 
obtained within this framework. The error in the output 
spectral function can be split up in a statistical error, 
due to the noise in the input data, and a systematic er- 
ror, due to the deviation of the default function from the 
true spectrum. The choice of a determines the relative 
size of these errors. In the classical MaxEnt method the 



most probable a is chosen.^ We find that this choice can 
make the statistical error unnecessary large. 

The input data is typically given as a number iVgampio 
of samples, G^{t), where each sample gives a (noisy) 
version of the imaginary time function G{t). We find 
that the accuracy can sometimes be improved by split- 
ting up the samples in A'^caic sub sets (batches), with 
-^sample /^caic Samples in each batch. We then perform 
Alcaic MaxEnt calculations, each with A^sampie/Acaic sam- 
ples, and then average the results, instead of performing 
one MaxEnt calculation A^sampie samples. This approach 
reduces the statistical error at the cost of an increase in 
the systematic error. 

We also discuss the possibility of an iterative MaxEnt 
method, where the output is used to define a new de- 
fault function. This usually works poorly, and we show 
that this is due to an increase in the statistical error, 
overwhelming the improvement in the systematic error. 
However, if the data are split in batches, as discussed 
above, the importance of the statistical error can be re- 
duced to the point where the approach improves the total 
accuracy. 

To further analyze the results, we introduce an alterna- 
tive method with a new, slightly different definition of the 
entropy. This leads to a set of linear equations, where the 
propagation of the errors can be analyzed more easily and 
features of the MaxEnt method better understood. This 
method, however, does not guarantee a positive spectral 
function, and it is less useful for practical calculations. 

In this paper we focus on a response function, the op- 
tical conductivity We introduce a typical criuj), 
which in the following will be refered to as the "exact" 
cr(a;). The form of <j{uj) was chosen using results for the 
two-dimensional Hubbard model as a guide. This model 
of (t(w) can easily and accurately be transformed to imag- 
inary axis data. We add statistical noise to the data and 
then transform the data back to the real axis, using the 
various modifications of the MaxEnt method. If a given 
method worked perfectly the ^{uj) that we started with 
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FIG. 1: (color on-line) "Exact" spectral function and two 
different default models as a function of frequency. The inset 
shows the models on a small energy scale. 



should be recovered exactly. The deviations from the 
"exact" a{ui) are then a measure of the accuracy of the 
different approaches. 

As an example, Fig. [T] shows an "exact" optical con- 
ductivity and two default models used in the MaxEnt ap- 
proach. The optical conductivity has a Drude like peak 
at a; = and a "Hubbard" peak at w ~ 3 corresponding 
to transitions between the Hubbard bands. The default 
models are chosen so that they satisfy the exact sum rule. 
The two models are chosen according to two different 
strategies. It is sometimes argued that the default model 
should contain little information, apart from certain ex- 
act results, such as sum rules. This way the results are 
not prejudiced by possible incorrect assumptions. Model 
1 has been chosen this way. Alternatively, as a default 
model one can use the output from a calculation at a 
higher temperature T. Model 2 has therefore been cho- 
sen to be quite similar to the "exact" result, but with 
all features somewhat broader. Model 2 will naturally 
deliver much more accurate output spectra. 

The paper is organized as follows. In Sec. |ll] we in- 
troduce the formalism. Sec. IIIII describes how a MaxEnt 
calculation is performed as an average of several MaxEnt 
calculations and Sec. IIVI discusses an iterative MaxEnt 
method. In Sec. |V]we present a simplified entropy defi- 
nition, leading to linear equations. 



II. FORMALISM 

We introduce the basic formalism, essentially follow- 
ing Jarrell and Gubernatis,^ and then provide error es- 
timates. The function Gi = G{Ti) for imaginary time 
is related to a spectral function Ai = A{uji) on the real 
frequency axis w, 



l,...Ar^ 



(1) 



via a kernel Kij = K{Ti,ujj), given for some discrete 
values ujj of co. For the case of the optical conductivity, 
considered here, the kernel is given by 



TT 1 — exp(— /Jcjj) 



)fj, (2) 



where fj is a weight factor chosen so that Eq. ([T]) corre- 
sponds to an integral over oj. For the electron Green's 
function the corresponding kernel is 



1 + e-P'^i 
We introduce a likelihood function 
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L 



1 



Gi — Gi 



(3) 



(4) 



where Gi are data obtained from, e.g., a Monte-Carlo 
calculation, with the statistical accuracy CTj, and Gi has 
been calculated from Eq ([T]). We also introduce the en- 
tropy 



A, 



5=2^ fi{Ai -nii- A,ln—), 
=1 



(5) 



where rrii is a default model. The quantity L — aS is then 
minimized with respect Aj. This leads to the equations 



E- 



mi 



(6) 



These equations are solved to obtain the spectral function 
Ai. The quantity a can be determined using statistical 
methods, giving the most probable a. This is referred to 
as the classical MaxEnt methodi Alternatively, one can 
average the spectrum calculated for different values of a, 
using the probability of that a as a weighting functionji 
This method, Bryan's method, gives similar results for 
the cases considered here. 

To estimate the error in this approach, we express the 
calculated spectral function A in terms of the exact result 

^Gxact 



Ai — Aa 



(7) 



where AA^ is the error in Ai. We assume that the error 
is sufficiently small that the logarithm in Eq. © can be 
expanded to lowest order. Then 



^ AG, - AG, 



^cxact 



A 



^) = 0, (8) 



where AG, = K.^AA, and AG, = G,-^^. 

is the error in Gi due to the statistical noise. To solve 

these equations, we define 



EAGiKij 



/I exact 



(9) 
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i=l 



KijKik 



^cxact 3^' 



Using matrix notations, 



AA b-'a ^ b-^K^a-^AG + b-'afln{ 



^cxact ' 



The error w is defined as 



W 



(10) 



(11) 



(12) 



i=l 



where (...) denotes the average over many different real- 
izations of the noise AG^ in the input data. Here Wstat 
is the error due to this noise and Wsyst is the error due 
to the deviation 



Am,- = 



(13) 



of the default function from the exact result. Since the 
noise is random, there is no contribution from the cross 
term in the square of the two terms in Eq. (|lip . The 
statistical error can then be written as 



Wstat 



^ ^{Kb-^fb-^K-),,^) (14) 



The second equahty was obtained by noticing that the 
terms j ^ fc do not contribute on the average and that 
the average of AGf is a'j. Later we consider the average 
over iVcaic MaxEnt calculations, each using data with the 
statistical accuracy a. The statistical error Wstat is then 
reduced by a factor of iVcaic, since Wstat refers to the 
square of the error in the output spectrum. 
For the systematic error we obtain 



f^syst 



ln( 



^exact 



)fab-'fb-'afH 



^cxact 



(15) 



The results in Eqs. ([T?1[T5|) apply to the case when Max- 
Ent calculation is not iterated. For an iterative calcula- 
tion we use Eq. (|22p below. 



Fig. [2] shows Wstat and Wsyst as a function of a for 
different a. The figure illustrates that Wstat behaves ap- 
proximately as 1/a. This illustrates the importance of 
introducing entropy, i.e., using an a > 0. For a — 0, the 
matrix b~^ is ill-behaved and the statistical error would 
be huge. Since Wstat depends only weakly on cr, it is not 
possible to make Wstat small for a = by simply reducing 
a (within reasonable limits). Introducing a > regular- 
izes b and leads to a manageable statistical error. The 
systematic error increases with a and there is therefore 
an optimal value of a where the total error is minimum. 
The dependence of the systematic error on a is shown in 
Fig. [31 It behaves roughly as ^/a. The optimal a there- 
fore increases as a is reduced. It also increases as the 
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FIG. 2: (color on-line) Statistical (wstat) [Eq. ([T4|] and sys- 
tematic (lUsyst) [Eq. (|15l) ] errors for default model 1 as a func- 
tion of Q and for different values of a. The parameters are 
13 = 15 and Nr = 60. 
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FIG. 3: (color on-line) Systematic error (wgyat) [Eq. (|15[l] for 
default models 1 and 2 as a function of a and for different 
values of the a. The straight line shows the curve O.OOlcr^''^, 
illustrating that lUstat is approximately proportional to ^/a. 
The parameters are /3 = 15 and A^'t = 60. 
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FIG. 4: (color on-line) Statistical (wstat), systematic (uigyst) 
and total (w) errors in a MaxEnt calculation for the spectrum 
in Fig.[T] default model 1, a = 40 and /? = 15 and 100 samples, 
each with the accuracy a. The full thick (red) line shows w 
when one (Alcaic = 1) MaxEnt calculation is performed for the 
average of over all data and the the thick broken (blue) line the 
result when Alcaic ~ 5 MaxEnt calculations are averaged, each 
calculation using the average of lOO/A'caic samples. The cross 
corresponds to a historic MaxEnt calculation for a — 0.01, 
which gives a « 40, used in the figure. The thick broken 
(blue) curve illustrates that a substantially lower error can be 
obtained by averaging 5 MaxEnt calculations. 

default model is made more accurate, e.g., by replacing 
default model 1 by model 2. 

Since cj = is often of particular interest we use a 
logarithmic w-mesh. For the case < a; < Wmax we use 

u!i = exp[(i — l)dx + Inj] — 7, (16) 

where dx — [ln(wmax + 7) — ln7]/(A'ij — 1). A small value 
of 7 leads to a smaller spacing of the points close to 
uj — 0. We have typically used N^^ — 121 points, 7 — 0.5, 
■^max — 12, 13 — 15 and — 60. For simplicity, we 
assume that the statistical error is given by 

a, = G,a, (17) 

in terms of some overall accuracy a. To perform these 
calculations we have developed a MaxEnt code, which 
was found to give almost identical results to a code made 
available to us by Jarrellii 

III. MULTIPLE MAXENT CALCULATIONS 

A QMC calculation is arranged so that it gives a num- 
ber of samples, G^{Ti), of G{Ti). From these data one 
can calculate the statistical accuracy Ci , check if the data 
are Gaussian and check for (undesirable) correlations be- 
tween the noise for different values of r.— The data G^{Ti) 
are then averaged over 1^ to obtain G{Ti) that is the in- 
put for the MaxEnt calculation, possibly after removing 
correlations between the noise at different r-pointsji 



TABLE I: Statistical (lUstat), systematic {wsyst) and total (w) 
error in MaxEnt calculations for the spectrum in Fig. [T] and 
default models 1 or 2. 100 samples, each with the accuracy o, 
were split up in Alcaic batches with lOO/A^caic samples and used 
in A'caic calculations. The average of the output was used as 
a default model, performing A'^iter iterations. The errors were 
obtain from Eqs. JTl ^ for Alitor = 1 and from Eq. for 
Mter > 1. The parameters were /3 = 15 and Nt = 60. 
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We now consider the case where we have 100 sam- 
ples, each with the accuracy a. After averaging over 
all the samples, the accuracy of the resulting data is 
ct/a/IOO = cr/lO. The value of a in a classical Max- 
Ent calculation depends on the specific realization of the 
noise. We therefore perform many calculations, each with 
a different realization of the noise, and average over a. 
For o- = 0.01, /3 = 15 and Nr = 60, classical MaxEnt 
calculations using the spectrum in Fig. [T] and the default 
model 1 then gave on the average a sa 40. 

Fig. m shows the statistical and systematic errors for a 
fixed a = 40 as a function of a (iVcaic = 1)- The cross 
gives the total error of a classical MaxEnt calculation 
corresponding to 100 samples with a — 0.01. The figure 
illustrates that the statistical error is much larger than 
the systematic error for the MaxEnt calculation. This 
is also illustrated in Fig. [S^. This shows the result of 
20 MaxEnt calculations with different realizations of the 
noise, each with 100 samples with the accuracy a. The 
thick (red) line shows the exact spectrum. The calculated 
spectra (thin green lines) scatter strongly around the ex- 
act result, illustrating a large statistical error. On the 
average, these spectra also deviate somewhat from the 
exact result, the value of cr(0) being slightly too small 
and the Hubbard peak being somewhat shifted towards 
lower energies, illustrating a small systematic error. 

We next group the 100 samples in iVcaic = 5 batches, 
each with 20 samples, and perform Alcaic MaxEnt calcu- 
lations. The accuracy of the data in these MaxEnt cal- 
culations is then only ^/N\^a/lQ. This increases both 
the systematic and statistical errors somewhat. Aver- 
aging these calculation, however, reduces the statistical 
error by a factor Alcaic- In Fig. |3] this leads to a large net 
reduction in the statistical error, which more than com- 
pensates for the increase of the systematic error. This 
is illustrated in Fig. [SJd, which shows 20 such results, 
each one obtained by averaging Alcaic = 5 MaxEnt cal- 
culations with 100/Acaic samples, but with different re- 
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FIG. 5: (color on-line) Optical conductivity calculated for the 
default model 1 using different methods. 100 samples, each 
with the accuracy a = 0.01 were given, a) Each curve shows 
results of a classical MaxEnt calculation using an average of 
all 100 samples. The figure shows 20 such curves, each corre- 
sponding to a different realization of the noise, b) Each curve 
shows the average of Alcaic = 5 MaxEnt calculations using 
lOO/A^caic samples, c) Each curve shows the results of iterat- 
ing the calculations in a) once, using the output in a) as a de- 
fault function in the next Ma:xEnt calculation, d) Each curve 
shows the results of iterating MaxEnt calculations A'iter = 5 
times. A'caic = 100 was used, and the default function was 
obtained from the average of these A'caic calculations. The 
parameters were 13 — 15, A'',- =60 and a = 40. 
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FIG. 6: (color on-line) The same as Fig. O but starting from 
the default model 2 and using a = 720. 



alizations of the noise. The spread between the curves 
is substantially smaller (wstat = 0.00014 vs. 0.00067) 
than in Fig. [5^, while the systematic error is somewhat 
larger {w^yst = 0.00021 vs. 0.00014). This leads to a 
substantial improvement in the total error {w = 0.00035 
vs. 0.00081). These results are also shown in Table ID 

The reason for this improvement is that that Wstat 3> 
Wsyst in the MaxEnt calculation with iVcaic = 1 and that 
Wstat and Wgyst have different dependencies on iVcaic- For 
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a = 0, Eq. p4)) gives that Wgtat c ■ Splitting up the 
calculation in A'caic calculations makes the effective a a 
factor -y/ -/Vcaic larger, while averaging reduces the error 
by a factor A'caic- The net result would be an unchanged 
statistical error. It is therefore crucial that the method 
has been regularized by introducing an entropy. Fig. 2] 
shows that for realistic values of a, Wstat actually has a 
quite weak dependence on cr, rather than behaving as a^. 
Splitting up the samples in several batches, and thereby 
reducing the accuracy of each batch, leads to a small 
increase in Wstat for each individual calculation. The av- 
eraging over A'caic calculations, however, reduces Wstat by 
a factor A'caic- At the same time Wgyst is increased, but 
only by approximately a factor iV^Yic' since this quantity 
behaves approximately as ^/a and cr increases by a factor 

V-^calc- 

Fig. [5] and Table U show the corresponding results us- 
ing default model 2. In this case the classical MaxEnt 
calculation chooses a value of a that makes Wstat and 
Wsyst comparable. The gain from splitting up the sam- 
ples and performing several MaxEnt calculations is then 
much smaller. 

We are now in the position to discuss the limits of 
accuracy that can be obtained in this approach. We con- 
sider as before 100 samples with the accuracy a — 0.01 
and allow for any combination of a, Alcaic < 100 and 
Niter < 40. Starting from default model 1, we obtain the 
results shown in Fig. [7^. The curve "One calc." shows 
the result of a traditional MaxEnt calculation, using all 
the samples in one calculation (Alcaic = 1 and Mter = !)■ 
If a classical MaxEnt calculation is performed, a w 20 is 
obtained. This result is shown by a cross. We can see 
that this value of a is not optimal, and a larger a would 
have given a smaller error. We next allow for Alcaic > 1 
calculations, each using lOO/TVcaic samples. We find the 
value of Alcaic which gives the best agreement with the 
"exact" (T{Ld). This ("Several opt.") leads to a much 
higher accuracy for small values of a. The curve is al- 
most flat as a function of a over a substantial range. For 
large values of a, Alcaic = 1 gives the best accuracy, and 
the curve falls on top of the curve "One calc." . 

To provide a criterion for how to split up the data 
in batches, we consider the statistical error [(Eq. [Til) ] 
again. As before we consider the case of A^sampio samples, 
each with the accuracy a, divided in A'^caic batches with 
A^sampic/A^caic Samples in each. We define the product 



M((j) 



(18) 



where b also depends on a. The statistical error of the 
kth calculation is then written as 



AA 



(k) 



I -'"calc 

1^=1 



N 



AG 



i^+(k~l)N/N,,i, 



(19) 

where AC is the error in the vth. sample, and the sta- 
tistical accuracy a-N^^i^ = f/ v^V/iVcaJc enters due to the 
averaging over A^/A^caic samples. We then average over 
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FIG. 7: (color on-line)Accuracy w of MaxEnt calculations for 
100 samples, each with accuracy cr. "One calc." uses the aver- 
age of all samples in one MaxEnt calculation. "Several opt." 
and "Several est." split up the samples in several batches and 
averages the resulting MaxEnt calculations. "Several opt." 
does this in the optimal way and "Several est." uses a pre- 
scription for finding the splitting when the exact result is not 
known. "Iterated" in addition uses the output spectral func- 
tion as default model in an iterative approach. The cross 
shows the result of a classical MaxEnt calculation, a) shows 
results for default model 1 add b) for default model 2. The 
parameters are 13—15 and Nt = 60. 



the A'caic calculations and obtain the error 



1 ^ _ 



(20) 



The average difference between two calculations with 
Alcaic and Afcaic batches can then be written as 

WMN = Y^[A^{Ncalc) - Ai{Mc^,lc)fwi = (21) 

i 

Mij {a N^^i^ Alcaic Mcalc )^ Mcalc 12 



calc 



calc 



This result represents an average over many different re- 
alizations of the error A&''\ In addition to the sta- 
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tistical contribution to the difference there is a system- 
atic contribution due to the error in the default function. 
We then compare calculations with iVcaic = -^sample and 
Mcedc = -^sample/ 2 batchcs. In the second calculation 
the statistical error is larger and the systematic error is 
smaller. If the total difference between the two calcula- 
tions is larger than twice the expected statistical differ- 
ence, this suggests that the gain in the systematic error 
outweighs the loss in the statistical error and the second 
calculation is accepted. We then compare this calcula- 
tion with a calculation with A^sampie/4 batches and if the 
latter is favorable the procedure is continued, consider- 
ing iVsampio/lO, iVsampio/20 and iVsampio/50 batchcs. The 
resulting accuracy is shown by the curves "Several est." 
in Fig. [T] This curve is above the curve "Several opt.", 
but the difference is not very large for most values of a. 



IV. ITERATING MAXENT 

Once a MaxEnt calculation has been performed, one 
can try to improve the default function by using the out- 
put spectral function as a new default function. Such an 
iterative approach, however, is usually not recommended. 
Fig. [S}: shows the results of such calculations using de- 
fault model 1. Indeed, the spread between between dif- 
ferent calculations is larger than in the noniterated case 
in Fig. [S^, implying an increased statistical error. 

Eqs. (ITil [T5)) used to calculate the statistical and sys- 
tematic errors for a noniterated default model are not 
appropriate in the case of iterations. The reason is that 
the default model in this case contains statistical errors 
due to the iteration procedure. Instead we perform many 
calculations N of the type shown in Fig. [St, giving spec- 
tral functions , v = 1, .., N . We then calculate 



Wstat 



i/=l 
. N ... 



N 



N - 



(22) 



u=l i=l 



w.y.t = E(^r - ^r^*)V. 



Due to nonlinearity, some of the statistical error actually 
shows up as a systematic error in Eq. (l22t . but this is 
neglected in the following. 

The results in Table HI shows that the statistical er- 
ror is more than doubled after one iteration, while the 
systematic error is not correspondingly reduced. 

We next consider the case when the samples are split 
up in A'caic batches and default model 1 is used. Using 
-^caic = 100 and iVitcr = 4 the total error is reduced, as is 
illustrated in Fig.[li and Table H By using TVcaic = 100, 
we drastically reduce the statistical error. The following 
iterations increase the statistical error by a substantial 
factor, but it nevertheless remains small. At the same 



time the iterations reduce the systematic error, so that 
both are improved compared with the noniterated case. 
Fig. mi shows similar results using default model 2. Since 
this model is very close to the exact result, iterations 
now lead to a small improvement, but even with such an 
accurate default function there is an improvement. In 
Fig. H the curve "Iterated" shows results when iteration 
is allowed (Mtor < 40). This leads to a substantial im- 
provement in the accuracy. Fig. shows similar results 
for default model 2. 



V. QUADRATIC ENTROPY 

In Eq. 1^ in Sec. |lT] we introduced the entropy, con- 
taining a logarithm. As a result, the basic equations of 
MaxEnt are nonlinear, which makes the analysis compli- 
cated. For this reason, we introduce a new definition of 
the entropy, which is used in this section for analyzing 
the behavior of MaxEnt. The expression in Eq. ([5]) is 
expanded to lowest (second) order in the deviation be- 
tween the solution and the default function. We then 
define this as the entropy, and use it in this section. This 
is then not an approximation but simply a new method. 
This method has some problems. For instance, it is not 
guaranteed that the spectrum is positive. Therefore we 
do not recommend the use of this method for calculating 
spectra, but simply use it to analyze MaxEnt. 

We define the entropy 



1=1 



TO, 



(") 



(23) 



where we have allowed for the possibility of the MaxEnt 
calculation being iterated, i.e., m^") depends on the iter- 
ation n. The original default function is m^^\ This leads 
to the equations 



6(") = K^a-^K 



(24) 



where matrix notations have been used and af and 
af/m'-^^^ are diagonal matrices. Then = 
[(6(")]-ia. The error in is 

AA("+i) = [6(")]-M^^a"'AG + a/^^^], (25) 

to'-"-' 

where Ato^") = to(") - ^'^'^act^ ^gg^^g 




M - 



(26) 
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and a similar definition for Sm^"'^ as for SA'-^\ 
SJ'^^i"'']^ contains the same integration factor fi as has 
been used earlier, but due to the factor l/m^"^ it gives 
more weight to errors where m*^"^ is small. We have 

^^("+1) = c(")[5g(") + (27) 

We find the eigenvalues ei^"' and eigenvectors |^*^"-') of 
the symmetric matrix c*-"-* . Introducing the expansion in 
these eigenvectors, SA'iJ^^ — (j/'^"' |(Sy4'^")) we obtain 

= ei") [(5G(,"' + &m^^\ (28) 
The matrix c can be rewritten as 

For the cases we have considered, the first matrix inside 
the bracket has a broad range of positive eigenvalues, ex- 
tending from eigenvalues much smaller than one to much 
larger than one. As a result, the matrix c^"^ is found to 
have some very small eigenvalues and many eigenvalues 
very close to one. This is illustrated in Table [TTl which 
shows the lowest eigenvalues for the default model 1 and 
a = 40. 

Fig. [5] shows the eigenfunctions corresponding 
to the lowest eigenvalues in Table HI The lowest func- 
tion is nodeless, and the higher functions have an in- 
creasing number of nodes. Functions with the eigenvalue 
very close to one oscillate so rapidly that the correspond- 
ing components of (5ml."'' and JgI"-' tend to have small 
weights, as shown in Table HIl As a comparison. Table HIl 
also shows the expansion coefficients of the default model 
2 in the eigenfunctions obtained for the default model 1 
and a = 40. 

It is crucial for the success of a MaxEnt calculation 
that the coefficients bmi^ and bG')f^ are typically small 
for el"'' close to one. From Eq. (|28p it follows that errors 

brtih^ and (SgI"'' corresponding to eigenvalues much 
smaller than one give a strongly reduced contribution 

to the error bJ^v''^^^ , while errors corresponding to the 
eigenvalue one are not reduced at all. For these compo- 
nents the deviation of the default model from the true 
result are taken over completely. 

At the same time this sets the limits for MaxEnt calcu- 
lations. A MaxEnt calculation fails if A{u)) has structures 
on such a small energy scale that there are important ex- 
pansion coefficients JAI"^^-* corresponding to eigenvalues 
close to one, since the MaxEnt calculation gives no ad- 
ditional information about these components. This also 
shows the danger of putting in too much structure on a 
small energy scale in the default function. This would 
make components bmi^ corresponding to £,y « 1 im- 
portant and the MaxEnt calculation would not remove 
them from J^^'^^^ ^ even if there is no support for such 
components in the data. 



The results in Figs. [5] and [6] show a beating pattern, 
where the different calculations agree approximately for 
certain values of w. This must be related to the noise 
in the input data, since this is what differs between the 
calculations. The reason can be seen from Table |ll] and 
Fig. [51 The contribution of the noise to the output is 
given by SiySG^J^K This contribution comes mainly from 
the eighth and ninth eigenvalues. The corresponding 
eigenfunctions in Fig. [5] have their zeros approximately 
where the deviations between the calculations in Fig. [5] 
are small, although the agreement is not perfect. The 
reason is probably the nonlinearity due to the logarithm 
in Eq. (fTTj) . For instance, if the logarithm is expanded 
to second order, the resulting product of two functions 
generates functions with more nodes than either of the 
two functions. As a result we find that (5^1"+^^ has ap- 
preciable errors also for components with a few more 
nodes than the eighth and ninth eigenfunctions. This 
then shifts the beating pattern slightly towards lower en- 
ergies. 

We introduce the projection operator 

p(") = ^ I |e(eo - 4"^), (30) 

V 

where the 8-function selects states with eigenvalues 
smaller than eo < 1- Eq- can now be iterated. If 
we assume that el"'' is independent of n, which is a good 
approximation, we obtain 

^^(„+i) ^ I YJitl e^JG^v^ + e"+^i5ml"^ for < Eq 
[ ei,[(5Gl°^ -I- (5ml°^] for e^, > eq 

(31) 

This illustrates how iteration reduces the systematic er- 
ror for components with < £o, but increases the statis- 
tical error. Whether iteration pays off then depends on 
the relative size of the statistical and systematic errors 
and the choice of Eq. In this linearized version, however, 
it docs not pay off to include all states in the projection 
operator (leading to P^"^ = 1). 

For the nonlinear case, the behavior is a bit different. 
From the expression for the error in Eq. (fTTj) . it follows 
that In (m/A"'^^'^*) enters. Expanding the logarithm leads 
to terms with products of eigenfunctions of the type in 
Fig. [51 Such products couple to higher eigenfunctions 
with more nodes. The result is that the error of a certain 
i/-component of \n{m / A°^'^'^*') depends not only on the 
error of that z^-component of m but also on the errors 
of other components, in particular lower ones. Whether 
the errors from the different contributions add construc- 
tively or destructively depends on the specifics of the 
model. For the cases we considered the contributions to 
the higher components often add destructively. Then it 
can be more favorable to iterate all components rather 
than just the ones that would be favorable according to 
Eq. pTjl . For the cases we have studied, this has usually 
been the case and this is the approach we used in Sec. lIVI 
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TABLE II: Lowest eigenvalues of the matrix c'"' [Eq. (I26|) ] and the corresponding amplitudes Jmi"' (1) and SCi^^ for default 
model 1. The values of Sm^,^^ (2) for default model 2 (expanded in the functions corresponding to default model 1) and the 
expansion coefficients of o-(aj) are also shown. The larger eigenvalues are all close to unity, the amplitudes of the corresponding 
SG^J^^ are all smaller than 2 10"*. The corresponding values of Sm'^^^ are also fairly small, smaller than 0.1 for default model 1 
and smaller than 0.02 for default model 2. We used /3 = 15, Nuj = 121, cjmax = 12, a = 40 and a = 0.001. 





.3 10"'' 


.2 10"^ 


.6 10"^ 


.30 10-'' 


.2 10-3 


.002 


.016 


.166 


0.747 0.982 


0.999 


1.000 


1.000 


<5m<,"' (1) 


-.92 


-.81 


-.28 


-1.7 


1.1 


-.01 


-1.1 


-.17 


.31 


-.08 


-.33 


-.15 


.007 


Smi°'> (2) 


.80 


-.32 


.06 


-.08 


.08 


.03 


-.19 
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.07 


.03 
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-.007 




3.1 
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FIG. 8; (color on-line) Eigenfunctions to the matrix c^"' in 



Eq. (|26p corresponding to the 10 lowest eigenvalues. The fig- 
ure illustrates how eigenfunctions corresponding to an eigen- 
value close to one oscillate very rapidly. 



VI. SUMMARY 

We have analyzed the MaxEnt approach for analyti- 
cal continuation, defining a statistical error, due to noise 
in the imaginary axis input data, and a systematic error, 
due to errors in the default function entering the entropy. 



The classical method for choosing the weight a of the en- 
tropy can lead to a nonoptimal choice, reducing the sys- 
tematic error at the cost of making the statistical error 
unnecessarily large. We find that the statistical error can 
be reduced by sphtting up the data in batches. A Max- 
Ent calculation is performed for each batch and the result 
is averaged. This approach increases the systematic error 
but the total error can be reduced. We have also studied 
an iterative approach, where the output spectrum is used 
as default function in a new MaxEnt calculation. We find 
that a straightforward apphcation of this approach often 
gives worse results due to a rapid increase of the sta- 
tistical error. By splitting up the data in batches, the 
statistical error can be reduced sufficiently that this is 
less serious. The reduction of the systematic error can 
then outweigh the increase of the statistical error. 



To analyze MaxEnt method, we have studied a lin- 
earized version of the problem. In this formalism it is 
easier to see how the statistical error propagates, in par- 
ticular in the case of iterations. One can also see how 
certain deviations of the default function from the exact 
result have little influence on the output, while others 
fully show up in the output. This illustrates the danger 
of having a default function with too much structure. 



While this paper shows the potential for improving the 
MaxEnt method, it is harder to provide prescriptions for 
how to use this. In Sec. lIIII we provided a prescription for 
how to split the data in batches, which we have found to 
often work fairly well for a give value of a. This method 
makes the resulting error less sensitive to the optimiza- 
tion of alpha. Alternatively, one can simply split the data 
in, say 10, batches. For each batch the classical method 
of determining a is used and the resulting MaxEnt results 
are averaged. This approach typically improves the ac- 
curacy of the output spectrum. In particular, it reduces 
the risk of finding spurious structures due to overfitting 
of noisy data, while some real structures can be lost in 
this approach. 
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